Generic Loop Parallelization for Reconfigurable Architectures
Abstract
Reconfigurable Computing (RC) is an intensively studied research area because of its potential to dramatically increase application performance. RC combines a general-purpose processor (GPP) with a Field Programmable Gate Array (FPGA), offering both the performance of hardware and the flexibility of software. Modern real-life applications (audio, video, image processing, etc.) spend most of their execution time in loops, which represent or include the application kernels, so these loops are an important source of performance improvement. In our work, we target loops whose bodies contain code for both the GPP (software functions) and the FPGA (hardware functions). We assume that there are data dependencies between consecutive tasks in the loop body, but not between different loop iterations. Taking the Molen machine organization as our framework, we apply existing loop optimizations to such loops in order to parallelize applications, so that multiple kernel instances run in parallel on the reconfigurable hardware while code executes concurrently on the GPP. In this paper, we focus on loop transformations that are suitable for loops containing an arbitrary number of software and hardware functions. Extended shifting relocates the functions placed at the beginning and at the end of a loop iteration in order to eliminate the data dependencies and allow certain software and hardware functions to execute in parallel. Loop distribution splits the loop into smaller loops (e.g., with only one kernel each), which in some cases allows a larger degree of parallelism when loop unrolling and shifting are applied.
We estimate the performance achieved by applying extended shifting in conjunction with loop unrolling, and compare it with the performance achieved by applying loop unrolling and shifting to the smaller loops obtained by distributing the original loop. The experimental results are based on randomly generated tests for loops containing between 2 and 8 kernels.
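The two transformations named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the Molen programming paradigm. The functions `sw_pre` and `hw_kernel` are hypothetical stand-ins for one software function and one hardware kernel, and a thread pool only models the GPP and the FPGA working concurrently. The sketch shows plain loop shifting for a single software/hardware pair, which is a simpler case than the extended shifting described above, followed by loop distribution.

```python
from concurrent.futures import ThreadPoolExecutor


def sw_pre(i):
    """Hypothetical software function (would run on the GPP)."""
    return i * 2


def hw_kernel(x):
    """Hypothetical hardware kernel (would run on the FPGA)."""
    return x + 1


def original_loop(n):
    """Each iteration runs sw_pre, then hw_kernel on its result.

    The dependency is inside the iteration only; iterations are independent.
    """
    out = []
    for i in range(n):
        a = sw_pre(i)
        out.append(hw_kernel(a))
    return out


def shifted_loop(n):
    """Loop shifting: sw_pre of iteration i overlaps hw_kernel of iteration i-1."""
    if n == 0:
        return []
    out = []
    a = sw_pre(0)  # prologue: first software call moved out of the loop
    with ThreadPoolExecutor(max_workers=2) as pool:
        for i in range(1, n):
            hw_future = pool.submit(hw_kernel, a)  # kernel of iteration i-1
            sw_future = pool.submit(sw_pre, i)     # software of iteration i
            out.append(hw_future.result())
            a = sw_future.result()
    out.append(hw_kernel(a))  # epilogue: last kernel call
    return out


def distributed_loops(n):
    """Loop distribution: one loop per function, linked by a temporary array."""
    tmp = [sw_pre(i) for i in range(n)]
    return [hw_kernel(x) for x in tmp]


if __name__ == "__main__":
    print(original_loop(5))
    print(shifted_loop(5))
    print(distributed_loops(5))
```

All three variants compute the same values. Both transformations are legal here only because of the stated assumption that there are no data dependencies between different loop iterations.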
Similar resources
Loop Parallelization for Reconfigurable Architectures
Reconfigurable Computing (RC) is one of the research directions that focuses on accelerating applications. In the presented approach we assume the Molen machine organization and the Molen programming paradigm as our framework. Molen combines a general purpose processor (GPP) and a Field Programmable Gate Array (FPGA), having the advantages of both speed of hardware and flexibility of software e...
Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling - Computers and Digital Techniques, IEE Proceedings
Coarse-grained reconfigurable architectures have become increasingly important in recent years. Automatic design or compilation tools are essential to their success. A modulo scheduling algorithm to exploit loop-level parallelism for coarse-grained reconfigurable architectures is presented. This algorithm is a key part of a dynamically reconfigurable embedded systems compiler (DRESC). It is cap...
Mapping Loops on Coarse-Grain Reconfigurable Architectures Using Memory Operation Sharing
Recently many coarse-grain reconfigurable architectures have emerged as programmable coprocessors, considerably relieving the burden of the main processors in many multimedia applications. While their very high degree of parallelism enables high performance in compute-intensive loops, their shared memory interface between several processing elements often becomes a bottleneck in many multimedia...
Enabling Parallelization via a Reconfigurable Chip Multiprocessor
While reconfigurable computing has traditionally involved attaching a reconfigurable fabric to a single processor core, the prospect of large-scale CMPs calls for a reevaluation of reconfigurable computing from the perspective of multicore architectures. We present ReMAPP, a reconfigurable architecture geared towards application acceleration and parallelization. In ReMAPP, parallel threads shar...
Evaluating Memory Architectures for Media Applications on Coarse-Grained Reconfigurable Architectures
Reconfigurable ALU Array (RAA) architectures—representing a popular class of Coarse-grained Reconfigurable Architectures—are gaining in popularity especially for media applications due to their flexibility, regularity, and efficiency. In such architectures, memory is critical not only for configuration data but also for the heavy data traffic required by the application. Hence, system designers...